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Abstract 

We present an efficient algorithm for computing the permanent for matrices of size N that 
can written as a product of L block diagonal matrices with blocks of size at most 2. For fixed 
L, the time and space resources scale linearly in N, with a prefactor that scales exponentially 
in L. This class of matrices contains banded matrices with banded inverse. We show that 
such a factorization into a product of block diagonal matrices gives rise to a circuit acting on a 
Hilbert space with a tensor product structure and that the permanent is equal to the transition 
amplitude of this circuit and a product basis state. In this correspondence, a block diagonal 
matrix gives rise to one layer of the circuit, where each block to a gate acting either on a single 
tensor component or on two adjacent tensor components. This observation allows us to adopt 
matrix product states, a computational method from condensed matter physics and quantum 
information theory used to simulate quantum systems, to evaluate the transition amplitude. 

1 Introduction 

For an arbitrary matrix A £ M^vxaKC), its permanent is defined by 

N 

per(A) = IlA-M' (!) 

where Sn denotes the permutation group of the set {1, . . . , N}. The fastest known exact algorithm 
for computing the permanent of general matrices is due to Ryser |16j . with running time 0(N2 ). 
Valiant showed that computing the permanent is #P-hard [23J. Therefore, it is unlikely that 
there exists an efficient algorithm and attention has been given to approximating the permanent 
or computing it exactly only for restricted classes of matrices with special structure. 

We start by reviewing the approximation results. For an arbitrary matrix A 6 Mn x n(C), 
Gurvits [8] showed how to efficiently obtain an additive approximation p € C such that 



\p - pei(A)\ <s-\\A 
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holds with high probability, where ||^4|| is the matrix norm given by the largest singular eigenvalue 
of A. The running time scales polynomially in N and 1/e. 

For an arbitrary matrix A having only non-negative numbers as entries, Jerrum and Sinclair 
[9] showed that it is possible to obtain a multiplicative approximation of the permanent. More 
precisely, for e > 0, it is possible to obtain p such that 

\p — per(^4)| < e • per(^4). 

The running time scales polynomially in n and 1/e. A significantly faster algorithm for this problem 
was presented in [2]. 

In this article, we present a polynomial time algorithm for exactly computing the permanent for 
a new class of matrices. We give a brief overview of some classes of matrices for which algorithms 
are known whose running time is polynomial or faster than that of Ryser's algorithm. 

For any matrix with at most cN nonzero entries, the permanent can be computed in time 
0*((2 — e) N ) time, with e depending on c |19j . 

Notable classes of matrices with special structure are those of circulant and Toeplitz matrices. 
Some explicit solutions or recurrence relations for some (0, 1) circulant matrices are given in |12t 
[I3j EH ES] • Other (0, 1) circulant and very sparse Toeplitz matrices are discussed in [5]. 

Efficient algorithms can also be found for particular classes of banded matrices. A matrix A 
is banded with bandwidth w whenever a%j = for \i — j\ > w. A banded matrix with bandwidth 
w = 1 is called a tridiagonal matrix. For a tridiagonal matrix, the permanent can be computed 
efficiently with the help of a recurrence relation [7j. For a Toeplitz matrix A of bandwith w, there 

is an algorithm that computes per (A) in time O ^( 2 ^) 3 log A^^ |18j . 

A further class is that of matrices whose permanent may be computed by the determinant of a 
matrix of the same size. This was established in the work of Temperley and Fisher [22] and in the 
work of Kasteleyn [10\ 111] , which were motivated by a problem in statistical mechanics. 

The key ideas underlying the algorithm presented here are motivated by methods used in 
quantum information theory. We express the permanent as the transition amplitude in a quantum 
circuit and apply matrix product states to evaluate this amplitude. A further example where matrix 
product states has been applied successfully to problems outside the field of quantum information 
theory is given in [4]. The authors present an improved algorithm for counting the number of 
satisfying inputs of certain classical circuits. 

We obtain an efficient algorithm for the new class of matrices of block factorizable matrices: 

Main result: Assume that for A € Mn x n(C) we have A = F1F2 ■ ■ ■ Fl, where each of the factors 
Fi is block diagonal with 2 by 2 or 1 by 1 blocks. Given such factorization, the permanent of A 
can be computed in time 0(N2 3L ) and space 0(N2 2L ). 

Observe that any matrix A that posses such factorization must be banded with bandwidth L. 
Further, if A is invertible, then the inverse A -1 must also be banded with bandwidth L. 

The problem of decomposing banded matrices with banded inverses into block diagonal matrices 
was studied by Strang [21, 20J. The results therein show that any invertible matrix A can be factored 
into L = 2w(2w + 1) such factors provided that A and A -1 are both banded with bandwidth w. 
We later describe briefly how such decomposition can be computed efficiently. 

A problem which makes the computation of the permanent such a difficult task is the absence 
of simplification rules present for the determinant. The determinant already by construction obeys 
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a multiplication rule, i.e. det(^4-B) = det(^4) det(-B). This rule does not hold for the permanent 
of a matrix product. However, the permanent obey the following simplification rules involving 
permutation matrices P and diagonal matrices D 

per(PAD) = per(P) per(A) per(D) and per(PAP') = per(^). (2) 

These rules allow us to use the Cuthill-McKee algorithm to reduce the bandwidth of the matrix A 
[6] without changing its permanent. Recall that the Cuthill-McKee algorithm suitably permutes 
the rows and columns of the sparse matrix to reduce its bandwidth. 

2 Permanent as the transition amplitude in a quantum circuit 

2.1 Multivariate polynomials and the permanent 

We now derive a new alternative expression for the permanent making use of multivariate polyno- 
mials. The importance of this expression is that our algorithm can be used to directly compute 
this expression. The idea for this expression is motivated by the appearance of the permanent in 
quantum optics. 

Let R = C[Xi, . . . , Xjv] be the polynomial ring in the N variables X±, . . . , Xn over the field 
C of complex numbers. The variable X% corresponds to the creation operator a\ of a photon in 
the ith mode. It is well-known that the permanent of a unitary matrix U can be expressed as the 
amplitude (1|3>[/|1), where |1) is the Fock state in which there is exactly one photon in each of the 
N modes, that is, |i) = a\ ■ ■ ■ ajy|vac) and <&u is the unitary describing the transformation on the 
Fock space induced by the transformation (a\, . . . , a* N ) i->- (oj, . . . , a^ N )U of the creation operators. 
This was observed in (TTJ Q] . 

We abstract from the specific details pertaining to quantum optics and rely solely on multivariate 
polynomials to derive the new expression. This approach is more straightforward and better suited 
to obtaining and analyzing our algorithm. 

Definition 1. Let R = C[Xi, . . . ,Xn]. To an arbitrary matrix A G Mjv X 7v(C), we associate the 
unique map <$>a : R ^ R by defining its action on the variables X±, . . . , Xn by 

N 

$ A (X i ) = J2 A ijX j fori = l,...,N 

and lifting it to all polynomials by requiring that it acts additively and multiplicatively with respect 
to polynomial addition and multiplication, respectively, that is, 

^ A {p + q) = ^A{p) + ^A{q) and ■ q) = * A (p) • $ A (q) 

for all p,q G i?0 

Observe that & A preserves the total degree of polynomials. Moreover, homogeneous polynomials 
are mapped onto homogeneous polynomials. To express this more formally, we introduce the 

$a is an endomorphism of the ring R. 
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grading of R with respect to total degree d > 0: 

= |n=(n 1 ,...,n JV )GN JV |^n J = d|, (3) 

= y c « x "> ^ 

where we use the shorthand notation X n = X^X^ 2 ■ ■ ■ X N N . We have $> A {R {d) ) C for d > 0, 
with equality iff A is invertible. 

Lemma 1. The permanent of A is related to &a as follows: 

<t> A {X 1 X 2 ---X N )=pei(A)X 1 X 2 ---X N + Y, CnXn - ^ 

neVW\(l,l,-,l) 

Proof. Consider the homogeneous polynomial 

<^ A (X 1 X 2 ... Xn )=(y M, h X h ] ( Y A *,h X h ) ■ ■ ■ ( Y A ^nX 3n 
\ii=i / \i2=i / \3n=i 

By inspection we see that the coefficient of X\X 2 ■ ■ ■ Xjy in <& A {X\X 2 ■ ■ ■ Xn) is equal to per(^4). □ 

Lemma 2. For B,C € M/v X jv(C), we have 

Proof. Assume A = BC so that Aij = J2k=i ^ikCkj for i,j = l,...,N. We have 

N N 

j=l j,k=l 
N n N 

= Y, B ' ik '^2 CkjXj = Y, Bik^c(X k ) 

k=l j=l k=l 
N 

= ^c(Y B ^ x k) = <M<M^)) 

k=l 

= ($ C o$ B )(X t ). 

We carried out this calculation explicitly to emphasize that the map A i-> $> A reverses the order, 
that is, &bc = &c and not <&b ° as one might expect. □ 

2.2 Factorization of A and the corresponding quantum circuit 

We now show that the permanent can also be expressed as the transition amplitude of of a circuit 
acting on a Hilbert space with a tensor product state. This allows us to use the method of matrix 
product states to compute the permanent. To this end, we associate the Hilbert space 

"H = span{|n) | n £ M (N) } 
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to the set R( N ^ of homogeneous polynomials of total degree N. For n € J\f( N \ the monomial X n 
corresponds to the basis state \n). By construction, the map $a is a linear map on this Hilbert 
space. We use |1) to denote the state |1) ® |1) ® ■ ■ ■ ® |1), which corresponds to the monomial 
X\X 2 ■ ■ ■ Xn. 

It is important for our algorithm that the Hilbert space % can be embedded into the tensor 
product Hilbert space 



(C JV+1 )® JV = span{|m) ® • • • ® \n N ) | < m < N for i = 1, . . . , 7v}. 



(6) 



Since (t£ N + l ')® N J^ag a tensor product structure, we can naturally consider circuits with the obvious 
notion of the two-qudit and single qudit gates. We restrict the two qudit gates to act on adjacent 
qudits only. 

Lemma 3. Let F <G Mjv X jv(C). If F has the form 

( 1 



1 



V 



1 / 



where the lxl block (a) acts on the subspace spanned by \rtk) ofC^ N+1 \ then $f is a single qudit 
gate 

I <8) ■ ■ ■ <8) I ®G' k ® I ® • • • ® I, 

v v ' v V ' 

fc-1 AT-ifc 

where G' k acts as diagonal matrix on C N+1 . 
If F has the form 

( 1 



a b 
c d 



where the 2 by 2 block 



a b 

c d 



1 7 



acts on the two qudit subspace spanned by \n k ) ® \n k +i) in 



(C N+1 )® 2 , then ® F is a two qudit gate 



I®G 



k,k+l 



fc-1 



AT-fc-1 



5 



where G^k+l ac t s on the tensor components k and (k + 1) of (C and is block- diagonal with 

respect to the subspaces 

S(m) = span{|n') ® \n") \ n' + n" = m} C C N+1 ® C^ 1 . 

Proof. Let us first assume, that F only has a single lxl block given by a in the k'th entry. According 
to definition H the map <S> F acts as <^ F (X^X^ 2 . . . X^ k . . . X™ N ) = a nk X^X^ 2 . . . X^ k . . . X n N N on 
the monomials. Due to the linearity of <& F this can be extended to the full space (C Ar+1 )® Ar . The 
action of the map Q F can be described by ® G' k ® I® n_fc 5 with 

N 

G' k = ^a n *\n k )(n k \. (7) 

n k =0 

This gate only acts on a single site. 

Furthermore, let us consider the action of a 2 x 2 block f ° ^ ^ , on the variables A~fc and 
.Xfc.4.1. If we assume that the F acts trivially on the remaining variables, we have that 

$ F (^i ni • • • K'K+i • • • X N N ) = £ ("*) ( nk+1 )a rk b nk - rk c n ^- r ^d r ^ x 



rfc=0r fe+1 =0 

Aj ■ -.A fe A fc+1 . ..Ajv . 1«J 



Again, by making use of the linearity of <& F , we can extend the action of this map to the full 
Hilbert space. This map acts on the state as the tensor product (g) G^^+i ® I®" - k -1 . For 

completeness, we state the gate Gk,k+i explicitly, 



2N n m n—m / \ / \ 

<w = EEEEl n ™ W^c"— <f x 

n=0m=0r=0 s=0 ^ ' ^ ' 

| r + n — m — s) | m — r + s) (m | (n — m \ . (9) 

It follows that this gate has to preserve the total degree of the variables X k and X k+ \, since by 
definition $ F (X% k X%!$ 1 ) = <S> F {X k ) nk <5> F (X k+1 ) nk +K This leads to a block diagonal structure of 
the gate G k) k+\- Each block is labeled by the total degree, or particle number, nfc+nfc+i- Note, that 
this is why we only consider the sum of n = 0, . . . , 2N in the representation in equation ([9]). □ 

Corollary 1. A decomposition of A of the form A = F\F>i - ■ ■ Fl, where each factor F% is block- 
diagonal with blocks of size 1 by 1 or 2 by 2, leads to a decomposition of &a = $F L " " " $f 3 $Fi o,s a 
quantum circuit of depth L. The blocks of each factor Fi correspond to gates that can be executed 
in parallel, that is, each Fi corresponds to one layer §> Fi . 

2.3 Decomposition of banded matrices with banded inverses 

We now show that the class of block factorizable matrices contains banded matrices with banded 
inverses, which were studied in the literature [21} 120] . 
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Ideally, we would like to have sufficient and necessary conditions characterizing matrices block 
factorizable matrices A, that is, those having decompositions of the form 

A = F X F 2 • • • F L , 

where each Fi is block-diagonal with 2 by 2 or 1 by 1 blocks. 

It is easy to see that A must be banded with bandwidth L to possess such decomposition. 
Furthermore, if A is invertible then A -1 must also be banded with bandwidth L since 

A-^F^-.-F^Fr 1 . 

The results on banded matrices with banded matrices due to Strang lead to the following theorem. 

Theorem 1. Suppose A and A -1 have bandwidth w. Then A is a product F\- ■ ■ F^ of block- diagonal 
matrices Fi with 2 by 2 or 1 by 1 blocks and L < 2w(2w + 1). 

The proof of consists of two steps: 

• Factor A into BC with diagonal blocks of size w, 2w, 2w, . . . for B and 2w, 2w, . . . , 2w for C. 
The shift between the two sets of blocks means that A = BC need not be block diagonal. 

• Break B and C separately into factors F with blocks of size 2 or 1. 

The method for obtaining the decomposition A = BC in the first step is described in |2 1 1. Section 
2]. The special case of orthogonal matrices is treated in |20t Theorem 2.1]. 

For the second step, we use the result that any matrix of size n by n can be decomposed into 
a product of n(n + l)/2 matrices such that each of these matrices contains only one 2 by 2 block. 
This is described in |20t Lemma 2.2]. Applying this result independently to each block of B and C 
we conclude that B and C can be decomposed into a product of w(2w + 1) block diagonal matrices 
with 2 by 2 and 1 by 1 blocks. We see that L < 2w(2w + 1). 

3 Evaluation of circuits based on matrix product states 

A naive evaluation of the circuit for the permanent would immediately yield a cost which scales at 
least as unfavorably as 0((N + 1)^), when keeping track of all the components of the vector in the 
tensor product Hilbert space T-L in eq. ([6]). However, for a circuit of small depth such as the one 
in Corollary [H a significantly more compact description of the resulting state can be achieved. A 
widely used description is the matrix product state (MPS) |24 } 127 } [26] . 

Definition 2. A matrix product state (MPS) on some tensor product space \ ip) € ^d+l^®N i s a 
state written in the form 

d 

|V>= £ Bgj3g...B£J|n 1 ...n*). (i ) 

ni,...,njv=0 

For k = 1,...,N, the matrices B^l € Mo fc lX £> fc (C) are rectangular with the convention Dq = 

Dn+i = 1- The first matrix Bn\ € Mi x d 1 (C) is a row vector and the last matrix B^j € 
Md jv _ iX i(C) is a column vector. The physical index takes values in {0,1,..., d}. The size 

of the MPS tensor Bnl is described by the triple (-D&_i, D^, d). 
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We associate a matrix B^l to each local basis element \n^}. The coefficient for each basis 
element | n\ . . . n^) is obtained as the product of the corresponding matrices. It is often very 
convenient to visualize the MPS in terms of the pictorial representation given in Fig. [TJ In this 
representation, each closed line corresponds to an index that has to be contracted. An open line 
represents a local basis element. 



B [k] 
' — • — P 



a,(3 



B m B m B im 

T — T — r.-T — T 



(A) 



m n 2 



(B) 



n N 



Figure 1: (A) The figure shows a graphical representation of the matrix product state tensor as 



B [k] 



a,0 



The three 



defined in (I10p . Each black dot corresponds to a matrix with elements 

indices (a, f3, nt) are represented in terms of three black lines. (B) The coefficient for each basis state 
is obtained by contracting the virtual indices. This contraction of the a and (3 indices is depicted 
as the closed center line. The full picture corresponds to the matrix product Bn\Bn\ ■ ■ ■ fil^. The 
remaining indeces {n^} are left open. This corresponds to the indeces of the tensor component 
which is supported on the local Hilbert space C d+1 . 



A subsequent application of the singular value decomposition between all bipartitions shows 
that any state in the Hilbert space (C rf+1 )® Ar posesses such a matrix product state representation 
[27] . For the sake of completeness, we repeat the proof here. 

Lemma 4. Any state \ ip) = Ylnx n N =o c n!...n N \ n\ . . . tin) £ (C d+1 )® Ar possesses a matrix product 



representation of the form I110\) . The maximum matrix dimension D max = maxj. is always 
bounded from above by D max 

Proof. Note that every state on a bipartite Hilbert space | tp) £ Hi <8> H2 possesses a so called 
Schmidt decomposition. That is, we can write 

di d,2 min(di,d2) 

\^) = S }Z S ^Z c ij\ i 3) = ^2 a a\fa}\^a) ■ (11) 
i=0 j=0 a=0 

We made use of the canonical singular value decomposition (SVD) [3] for rectangular matrices. 
For the components of the matrix c we have Cij = Ui tQ . a a V j jQ with a a > 0. Furthermore, the 
matrices U = Yl a b Ua,b I a ) {b \ and V = J2 a b ^a,,b I a ) (b \ are unitary. The so-called Schmidt vectors 
are given by | (p a ) = Y2i Ui, a \ i) as well as | ip a ) = Ylj Vj,a \j}- They form orthonormal bases on 
the two sides of the bipartition due to the unitarity of U and V. 
To obtain the MPS representation, we proceed as follows: 

1. Consider the cut (rai), (722, . . . , n^) and perform a SVD of the matrix C( ni w n2 ___ njv ) with respect 
to this partition. The resulting state can be expressed as 
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2. The resulting Schmidt vector tpai '^J (with support on the space [2 . . . N]) depends on the 

indices cti, ni, . . . ,n^. Introduce the next cut between (at\, n-z), (n$, . . . , njv) and perform 
a further SVD along this partition. This yields the representation 



an 



n2) 



[3...N] 



L / («i 



n 2 ),a 2 "2 



a 2 



(13) 



3. We now proceed inductively by repeating step 2. After the final local space is reached, the 
resulting state is of the form 

l^-VVt/W a^U [2] o-Pl U [N ~ 1] a [N-i]yiN] I nAT \ 

I VI ~ U ni,ai a ai u ( ai ,n 2 ),a 2 a a2 ' ' ' U (a JV _ 2 ,niV-i),aiV-i "JV-l V «iv,ajv-i I 1 ' ' ' ,LN ' 1 

(14) 

We obtain the desired MPS representation by defining the entries of the MPS tensors by setting 

B [k]] -Tj\k] [fc] i W [Nh _y[N] 

Observe that the largest possible maximum dimension .D^ occurs for the cut at the center of the 
chain. In this case, the dimensions of the respective Hilbert spaces are bounded from above by (d + 
1)1^/2] f which then corresponds to the largest possible rank of the matrix S = Y^ a cr a \a) {a\. □ 

The matrix product state can be described with ^2!^ =l (d+l)DkDk+i < 0(NdD^ nax ) parameters. 
This can be considerably less than 0((d + 1) ) when the matrix-dimensions D^, also referred to as 
bond dimensions, are small. This scenario corresponds to a quantum state with little correlations 
among the qudits, which is said to possess a small amount of entanglement in quantum information 
theory. 

Matrix product states and the application of gates to the states can conveniently be depicted as a 
tensor network (Fig. [2]). The contraction of the entire tensor network corresponds to the evaluation 
of expectation values when all the lines are closed. In this picture, the permanent can be understood 
as the contraction of the tensor network that is generated by the circuit decomposition of <&a- 

During the evaluation of the circuit, it is desirable to retain the standard MPS form of the state 
to be computationally more efficient. A further result in [27] shows at which cost local updates 
of a general MPS can be computed. A single qudit gate acting the site k does not increase the 
Schmidt-rank of the MPS and can be applied to the local tensors individually. A two qudit 
gate acting the sites k and k + 1, however, does mix the neighboring tensors. The larger tensor 
resulting from this operation has to be decomposed again by a SVD, as depicted in Fig. [3l 

Lemma 5. For a matrix product state \ ift) in the Hilbert space (C d+1 )® N , the following update 
operations can be performed so that \ ip) remain in matrix product form. 

1. An update corresponding to a single local gate G' k at site k can be applied to the MPS \ if)) at 
costO({d+l) 2 D k D k+1 ). 

2. An update corresponding to a two qudit gate Gk,k+i acting two adjacent sites k, k + 1 can be 
applied at cost 0(mm((d + l)D k _i, (d + l)Z?fe + i) 3 ). 

Proof. Let | ip) = J2{n k } B n\ ■ ■ ■ Bn$ \ ni . . . n N ). 
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Ga 



i = 1 
i = 2 
i = 3 



n N 



Figure 2: A circuit, such as the one considered in Corollary [H can be understood as the contraction 
of a tensor network where the base corresponds to a matrix product state. We consider single qudit 
and two qudit operations acting on adjacent qudits. The former are represented by the gates G' k 
and the latter by the gates Gk,k+i- Each of these gates can be understood as local tensor in the 
tensor network. We have indicated the different layers % = 1, 2, 3 ... of the network. In each layer, 
the individual operations commute and the corresponding commuting gates can be performed in 
parallel. 



1. The single qudit gate G' k acts as 
!®(k-i) ®Q' k ® l N ~ k | V) : 



{n k },n' k 



[N] 

N 



ni 



n N 



(15) 



We see that it only modifies the kth MPS tensor and does not alter the other tensors. This 
update corresponds to a single tensor update that retains the overall structure of the matrix 



product state. The resulting tensor is given by 



B' [k ) 



a,/3 



T,n k \- G 'k\n{ 



:n k 



B [k] 



a,/3 



. The cost 



for this operation behaves as follows: for each pair a, j3 and each n' k , we have to perform a 
vector multiplication at the cost (d + 1). The resulting overall cost is 0((d + l) 2 D^Dk+i) 
since there are D k D k+ i pairs a and f3 and d + 1 indices n' k . 

2. The two qudit gate Gk,k+i acts on the state as 

E B[ nl ■ ■ ■ i G ^ kK+lU n k ,n k+l) B S B ^ ■ ■ ■ B S K ■ • " »Mh-1 -»»)■ ^ 
{n k },n' k ,n' k+1 

We see that it combines the two, initially independent, tensors Bn k Bn k+1 to the larger tensor 
of the form 



M [k,k+i] 

lv± n k ,n k+1 



E ^ 



(n k ,n k+1 ),(n' k ,n' k+1 ) 



B lk] B lk+l] 



n k ' n k +l 



"k+1 



(17) 



This tensor has (d + l) 2 D k ^iD k+ i elements. The contraction of this larger tensor can be 
computed in time of the order 0((d+ l) A Dk-iDk+i)- 

After the contraction, we have to cast the tensors back into the original MPS form as explained 
in Fig. [3l To achieve this, we regroup the indices into (a,nk), (nk+i,P) and perform a SVD 
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D k - U 




D, 



(C) 



D k+1 



Figure 3: The figure depicts the steps that are performed after an update of the MPS with a gate 
Gk,k+i acting on two adjacent sites. The update proceeds as follows: (A) First we contract the 
MPS tensors B^ and B^ k+1 ^ and apply the gate to the indices nfc,nfc + i. We are then left with a 
bigger tensor denoted by M^ k ' k+l \ This larger tensor has the components [M^'^+^J^ ^. (B) We 
then regroup the indices (a, n^), (n^+i,/?) and cast M^- k,k+1 ^ into matrix form [-^ fc ' fc+1 ](nfc,a),(n fc+lll S) 
so that we can perform a SVD. The resulting SVD yields the components [M fc,fc+1 l 



their entries 



(C) We then define the new MPS tensors -B'jfj and B 



by setting 



B 



/[*] 
>ik 



a, 7 



5 



/[fc+1] 



7,/3 



r 



(n fc+ i,/3),7> 



respectively. 



on the matrix [M^' fc+1 l] , . , ^ „. The resulting matrix is (d + l).Dfc_i x (d + 
dimensional. We are left with the decomposition 



[M 



k,k+l] 



\(n k ,a),(n k+1 ,p) 



^2 U (nk,<x),-y a -yV \n k+1 ,P),T 



(18) 



The SVD can be performed in 0(min((d+l)Dfc_ 1 , (<i+l)Z\. +1 ) 3 ) operations. Since we assume 
that min(-Djfc_i, -Dfc+i) > d + 1. The cost of the SVD outweighs the expense of constructing 
the tensor M^ k ' k+1 ^ and we therefore state this bound as the total runtime of this update. 

'by 



After the SVD has been performed, we define the new MPS tensors S'jjj and B 
setting their entries 



n k +i 



B' 



a,7 



U(n k ,a)^°'~f an d 



R /[fe+i] 

n k+ i 



1,P 



V 



(n fe+ i,/3),7; 



respectively. 



(19) 



□ 



In summary, the computational cost of contracting the tensor network in terms of time and 
space usage scales polynomially with the maximal bond dimension D max . Therefore, an evaluation 
of the circuit is feasible when this dimension remains small. This observation was made in[27j. 
Certain quantum circuits can be simulated by classical means, provided that the entanglement is 
low during the entire computation. We show in the next section that this is the case for the circuits 
in Corollary [H 
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4 Presentation of our algorithm and analysis of its time and space 
complexity 

To evaluate the circuit that corresponds to the permanent, we start the computation with the 
product state l) = 1 11 . . . 11). For this state, we have D max = 1. However, this dimension grows 
with each application of a new layer of &(A). In general, as we will see, the bond-dimension, 
and by that the resulting computational effort, can grow exponentially with the bandwidth of the 
matrix A. The following lemma states by how much each can increase when a single gate from 
Lemma [3] is applied. 

Lemma 6. Let &f denote a local gate as in Lemma\^ After the application of ^f to the adjacent 
MPS tensors and B^- k+1 \ with dimensions (D^-i, D^, d) and (.D^, D^i, d), respectively, the 
shared bond dimension behaves as follows: 

1. If ads on a single site k, then the dimensions D' k remains unchanged. 

2. If ^f ads on two adjacent sites k, k + 1, then the new shared bond dimension D' k between 
these sites is is bounded by D'k < min {(d + l)Dk-i, (d + l)Dk+i}- 

Proof. The operations for the local updates are described in the proof of Lemma [5j 

1. If the map §f acts on a single site k, then the new MPS tensor is updated according to 
_g/W _ [<&A\ n n ■ Therefore, the triple (.D&_i, D^, d) remains unchanged and the 
MPS structure of the state is preserved. 

2. If the map &f acts on the two adjacent sites k and k + 1, the tensor has to be updated 
according to M^+i = [^](n fc ,n fc+1 ),«,< +1 ) B nl B nj^- This tensor corresponds 
to the one depicted in Fig. [3j After regrouping the indices we are left with a matrix of 
dimension (d + l)Dfc_i x (d + l)Dk+i- Therefore the singular value decomposition of the 
matrix M k,k+1 can have at most the the rank min {(d + l)-D/c_i, (d + l)Dfc + i}, which gives 
the desired bound on D' k . 

□ 

Recall, that the gates G^^k+i and G' k are block diagonal, c.f. Lemma [3l Therefore, as we will 
show now, the physical dimension is at most d = 2 l after the application of i layers. This fact can 
be exploited to achieve a linear scaling with respect to ./V in the time and space requirements. 

Lemma 7. Assume we apply the first i layers of the circuit i.e. $F t ■ ■ ■ &Fi> corresponding 
to the decomposition A = F\ . . . Fi . . . Fl (c.f. Fig. // the initial state of the circuit is l) = 
| 11 ... 1), then the local degree, i.e. particle number, is always bounded by < d = 2 l at every 
site k. 

Proof. The gates of the circuit G = Q F k can be constructed from A = F\ . . . Fi . . . Fl as explained 
in Lemma [31 Recall, that the gates & F k have the property that they preserve the total degree of 

i 

the monomials X k X k r ^ 1 and therefore are themselves block diagonal. Here each block corresponds 
to a particular local particle number. The local degree d at a single site can increase when we have 
a gate that is supported on more than one site. Let us estimate by how much d has to grow at 
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each step. We consider the action of a set of commuting gates & F k on the basis states described 
by X™ 1 XI 2 . . . X™ N . We write w.l.o.g. 

G12G34 • • • Gjv_i i Ar(X™ 1 X% 2 . . . X^ N ) 

^ F i 2 {X 1 ) ni ^ F i 2 {X 2 ) n2 . . . $ F , N -i,i f (X N -i) n > f - 1 $ I!M -i,N(X N ) nN 

\ "AT 

\ Nr X r I . (20) 



\ r / \ r / \ r 



By counting the degree, we see that the highest degree for two adjacent sites, e.g. X^ and X^+i, is 
always given by + n^+i. Therefore, we have that the local dimension increases as d! = 2d. We 
initially start in the state | l) that corresponds to X\X 2 ■ ■ ■ X^. This state has a degree of d = 1 
at each site. We proceed to apply layer by layer and therefore have that d = 2 l after i layers. □ 

We are now ready to present the algorithm. The main procedure is written in pseudocode as 
stated in Table [TJ An informal description of the algorithm follows in the subsequent paragraph. 



1 

2 
3 
4 
5 

6 
7 
8 
9 
10 
11 
12 
13 
14 
15 
16 
17 
18 



procedure (factorization OF A = F\ . . . F£) 

prepare MPS {B^} k=1 _ N=dX Td{A) in Bn\ = 5 1>nk for n k G {0, l,d = 2}; 
for i = 1 to L do 

for k = 1 block in F { = fe=1 do 
construct G = <& F k ; 

apply G to B^B^; 

compute SVD and recast in standard MPS form; 
k <— k + dim(F i fc ); 
end for 

if N > 2 i+1 then 

increase local MPS dimension d = 2 4+1 ; 
else 

set local MPS dimension d = (N+l); 
end if 

end for 

return per (A) = sj 1] sj 21 . . . B [ * ] ; 
end procedure 



Table 1: The algorithm computes the permanent of a matrix A £ Mn x n(C) given a factorization 
A = F\ . . . Fl as input. 

We assume that we are given a decomposition of the matrix A = F\ . . . Fl, where Fi = (J) fc=1 F^ . 
The blocks have size 2x2 and 1 x 1 as in Theorem [3l Each block corresponds to a local gate 
G = Q F k in the circuit. The circuit starts with the initial state | l). This state can be represented 

1 \k] \k] 

as a MPS with D = 1 by choosing the B[ = 1 for all k = 1 . . . N. All other components Bnl with 
rik > d = 1 of the MPS tensor are set to zero. Every layer corresponds to a single matrix F{. The 
different blocks in the matrix commute and correspond to gates that can be applied in parallel. 
We update each MPS tensor B^ as described in Lemma [5l This ensures that the state is still in 
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matrix product form after the application of the new layer. Before we apply the next iteration, we 
have to increase the local dimension d of the tensors to d = 2^ a y er . With each iteration the bond 
dimension D max will increase. Finally after the L layers have been applied, we proceed to compute 
the scalar product with the state | l). This can be done in time 0(N((d + l)D max ) 3 ) and amounts 

to computing the matrix product per (A) = B±B^ . . . B^. The full estimate of the runtime, as 
well as the space requirements are stated in the following theorem. 

Theorem 2. Assume that for A E MjVxV (C) we have A = F1F2 ■ ■ ■ Fl, where each of the factors 
Fi is block diagonal with 2 x 2 or 1 x 1 blocks. Given such factorization, the permanent of A can 
be computed in time 0(N2 3L ) and space 0(N2 2L ). 

Proof. Given the decomposition A = F± . . . Fl, we construct the corresponding circuit as outlined 
in Lemma [31 The central figure of merit is the bond dimension D max . Given the initial state | l), 
this dimension is D max = 1. However, with the application of each layer this dimension increases 
according to (c.f Lemma E|) D'k = min((d + [d + l)Dk+\). As we have outlined in the 

algorithm, the degree d does not stay constant but grows at each iteration (c.f. Lemma [7]), so that 
after i layers, we have d = 2*. Thus D max < Iif =l {2 % + 1) = 0(2 L ). The bound on D max can be 
used to give runtime and space bounds according to Lemma [5j We need to store a total of N MPS 
tensors each of size at most 0((2 L , 2 L , 2 L )). We suppress the linear scaling of L in the exponent. 
Therefore, in the worst case the space required scales at most as 0(N2 2L2 ). Furthermore, due to 
the runtime bound 0(min((d + l).Dfc_i, (d + l)-Dfc + i) 3 ) for the application of an individual gate, 
the total time is bounded by 0(N2 3L2 ). □ 

The runtime bound as stated in theorem [2] scales linearly in the matrix size N. The prefactor 
which is related to the bandwidth w of the matrix via w < L scales as 2 . However, observe that 
the highest dimension d can never exceed N, which is the total degree of the full initial polynomial. 
The aforementioned bound is therefore only a good estimate for the worst case runtime, as long as 
2 L < N. When 2 L > N one can easily see, following the same arguments as in theorem [21 that 
the runtime has to always be bounded by 0(N(N + l) 3 ( i+1 )). This bound is still polynomial in 
the matrix dimension. Furthermore, we would like to emphasize, that the stated bounds are only 
upper bounds for the worst case scenario for the algorithm. In practice it can occur that a large 
set of the singular values are zero after the gates Gk,k+i has been applied. 

5 Conclusions 

We have obtained a polynomial time algorithm for computing the permanent of block factorizable 
matrices. The ideas behind this algorithm are the expression of the permanent as the transition 
amplitude in a quantum circuit and the application of matrix product states to evaluate this 
amplitude. 

In the present work, we have limited ourselves to block factorizable matrices, that is, those with 
short factorizations into block diagonal matrices with blocks of size at most 2. These decompositions 
give rise to quantum circuit whose depths directly corresponds to the length of the factorization; 
the blocks of each factor correspond to quantum gates of one layer that can be executed in parallel. 
To the best of our knowledge, this is the first time that a multiplicative decomposition has employed 
successfully to obtain an efficient algorithm for computing the permanent of matrices. 
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In a future publication, we will show how to rely on different techniques from condensed mat- 
ter and quantum information theory such as matrix product operators |25j to obtain polynomial 
algorithms for computing the permanent of new classes of matrices. 
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